Abstract — This paper proposes a framework dedicated to the construction of what we call discrete elastic inner products, allowing one to embed sets of non-uniformly sampled multivariate time series or sequences of varying lengths into inner product space structures. This framework is based on a recursive definition that covers the case of multiple embedded time elastic dimensions. We prove that such inner products exist in our general framework and show how a simple instance of this inner product class operates on some prospective applications, while generalizing the Euclidean inner product. Classification experiments on time series and symbolic sequence datasets demonstrate the benefits that we can expect from embedding time series or sequences into elastic inner product spaces rather than into classical Euclidean spaces. These experiments show good accuracy when compared to the Euclidean distance or even to dynamic programming algorithms, while maintaining a linear algorithmic complexity at the exploitation stage, although a quadratic indexing phase is required beforehand.

Index Terms — Vector space, Discrete time series, Sequence mining, Non-uniform sampling, Elastic inner product, Time warping. 
1 Introduction

TIME series analysis in metric spaces has attracted much attention over numerous decades and in various domains such as biology, statistics, sociology, networking, signal processing, etc., essentially due to the ubiquitous nature of time series, whether they are symbolic or numeric. Among other characterizing tools, time warp distances (see [20], [13], and more recently [?], [9], among other references) have shown some interesting robustness compared to the Euclidean metric, especially when similarity search in time series databases is an issue. Unfortunately, this kind of elastic distance does not enable the direct construction of definite kernels, which are useful when addressing regression, classification or clustering of time series. A fortiori, such distances do not make it possible to directly construct inner products involving some time elasticity, i.e. inner products able to cope with some stretching or some compression along specific dimensions. Recently, [10] have shown that it is quite easy to propose inner products with time elasticity capability, at least for some restricted time series spaces, basically spaces containing uniformly sampled time series that all have the same length (in such cases, time series can easily be embedded in Euclidean spaces).
The aim of this paper is to extend this preliminary work on time elastic inner products in order to construct an elastic inner product structure for a quasi-unrestricted set of sequential data or time series, i.e. sets for which the data is not necessarily uniformly sampled and may have any length. Section two gives the main notations used throughout this paper and presents a recursive construction for inner-like products. It then gives the conditions for, and the proof of, the existence of elastic inner products (and elastic vector spaces) defined on a quasi-unrestricted set of time series, while explaining what we mean by quasi-unrestricted. The third section succinctly presents some preliminary applications, mainly to highlight some of the features of elastic inner product vector spaces, such as orthogonality. The fourth section presents two experiments, the first one relating to time series classification and the second one addressing symbolic sequence classification.



2 Elastic Inner Product Vector Spaces 

Starting from the recursive structure of the Dynamic Time Warping (DTW) equation, in which the non-linear Max operator is replaced by a Sum operator, we propose a quite general, parameterized recursive equation that we call elastic product. Our goal in this section is to derive the conditions under which such an elastic product is an inner product that maintains a form of time elasticity.

2.1 Sequence and sequence element 

Definition 2.1. Given a finite sequence A, we denote by $A(i)$ the i-th element (symbol or sample) of sequence A. We will consider that $A(i) \in S \times T$, where $(S, \oplus_S, \otimes_S)$ is a vector space that embeds the multidimensional



JOURNAL OF I5T E X CLASS FILES, MARCH 2012 






space variables (e.g. $S \subset \mathbb{R}^d$, with $d \in \mathbb{N}^+$) and $T \subset \mathbb{R}$ embeds the timestamps variable, so that we can write $A(i) = (a(i), t_{a(i)})$, where $a(i) \in S$ and $t_{a(i)} \in T$, with the condition that $t_{a(i)} > t_{a(j)}$ whenever $i > j$ (timestamps strictly increase in the sequence of samples). $A_i^j$ with $i \le j$ is the subsequence consisting of the i-th through the j-th element (inclusive) of A, so $A_i^j = A(i)A(i+1) \cdots A(j)$. $\Lambda$ denotes the null element. By convention, $A_i^j$ with $i > j$ is the null time series, denoted $\Omega$.



2.2 Sequence set 

Definition 2.2. The set of all finite discrete time series is thus embedded in a spacetime characterized by a single discrete temporal dimension, which encodes the timestamps, and any number of spatial dimensions, which encode the value of the time series at a given timestamp. We denote by $U = \{A_1^p \mid p \in \mathbb{N}\}$ the set of all finite discrete time series, where $A_1^p$ is a time series with discrete index varying between 1 and p. We denote by $\Omega$ the empty sequence (with null length) and, by convention, $A_1^0 = \Omega$, so that $\Omega$ is a member of set U. $|A|$ denotes the length of the sequence A. Let $U^p = \{A \in U \mid |A| \le p\}$ be the set of sequences whose length is shorter than or equal to p. Finally, let $0_S$ denote the null value in S, and let $U^*$ be the set of discrete time series defined on $(S - \{0_S\}) \times T$, i.e. the set of time series that do not contain the null spatial value.



2.3 Scalar multiplication on U* 

Definition 2.3. For all $A \in U^*$ and all $\lambda \in \mathbb{R}$, $C = \lambda \otimes A \in U^*$ is such that, for all $i \in \mathbb{N}$ with $1 \le i \le |A|$, $C(i) = (\lambda \cdot a(i), t_{a(i)})$, and thus $|C| = |A|$.



2.4 Addition on U* 

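Definition 2.4. For all $(A, B) \in (U^*)^2$, the addition of A and B, noted $C = A \oplus B \in U^*$, is defined in a constructive manner as follows: let i, j and k be in $\mathbb{N}$.

k = i = j = 1,

As long as $1 \le i \le |A|$ and $1 \le j \le |B|$,
if $t_{a(i)} < t_{b(j)}$, $C(k) = (a(i), t_{a(i)})$ and $i \leftarrow i+1$, $k \leftarrow k+1$

else if $t_{a(i)} > t_{b(j)}$, $C(k) = (b(j), t_{b(j)})$ and $j \leftarrow j+1$, $k \leftarrow k+1$

else if $a(i) \oplus_S b(j) \ne 0_S$, $C(k) = (a(i) \oplus_S b(j), t_{a(i)})$ and $i \leftarrow i+1$, $j \leftarrow j+1$, $k \leftarrow k+1$

else $i \leftarrow i+1$, $j \leftarrow j+1$

Once one of the two sequences is exhausted, the remaining elements of the other sequence are appended to C in their original order.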

Three comments need to be made at this level to clarify the semantics of the operator $\oplus$:
i) The addition of two uniformly sampled time series of equal lengths coincides with the classical addition in vector spaces, except that zero values are discarded to ensure that the sum of two time series remains a member of $U^*$. Fig. 1 gives an example of the addition of two time series that are not uniformly sampled and that have different lengths.

ii) Implicitly (in light of the last case described in Def. 2.4), any sequence element of the form $(0_S, t)$, where $0_S$ is the null value in S and $t \in T$, must be assimilated to the null sequence element $\Lambda$. For instance, the addition of $A = (1,1)(1,2)$ with $B = (-1,1)(1,2)$ is $C = A \oplus B = (2,2)$: the addition of the two first sequence elements is $(0,1)$, which is assimilated to $\Lambda$ and as such suppressed in C.

iii) The $\oplus$ operator, when restricted to the set $U^*$, is reversible, in that if $C = A \oplus B$ then $A = C \oplus ((-1) \otimes B)$ and $B = C \oplus ((-1) \otimes A)$. This is not the case if we consider the entire set U.
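The operators of Definitions 2.3 and 2.4 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes univariate values (S = R) and represents a series of $U^*$ as a list of (value, timestamp) pairs with strictly increasing timestamps; `scale` and `add` are hypothetical names. Elements whose summed value is zero are dropped, as in comment ii), and the tail of the series that is not exhausted is copied at the end.

```python
from typing import List, Tuple

# A series of U* (univariate case): (value, timestamp) pairs, timestamps
# strictly increasing, no zero value.
Series = List[Tuple[float, float]]


def scale(lam: float, a: Series) -> Series:
    """Def. 2.3: multiply every value by lam and keep the timestamps."""
    if lam == 0.0:
        # Every (0, t) element is assimilated to the null element.
        return []
    return [(lam * v, t) for v, t in a]


def add(a: Series, b: Series) -> Series:
    """Def. 2.4: merge on timestamps and sum co-occurring values."""
    c: Series = []
    i = j = 0
    while i < len(a) and j < len(b):
        (va, ta), (vb, tb) = a[i], b[j]
        if ta < tb:
            c.append((va, ta))
            i += 1
        elif ta > tb:
            c.append((vb, tb))
            j += 1
        else:
            if va + vb != 0.0:  # a null sum is suppressed (comment ii)
                c.append((va + vb, ta))
            i += 1
            j += 1
    # Copy whatever remains of the series that is not exhausted.
    return c + a[i:] + b[j:]


# Comment ii): (1,1)(1,2) + (-1,1)(1,2) = (2,2)
print(add([(1.0, 1.0), (1.0, 2.0)], [(-1.0, 1.0), (1.0, 2.0)]))  # [(2.0, 2.0)]
```

Reversibility (comment iii) can be checked with `add(add(A, B), scale(-1.0, B)) == A` on values that are exactly representable in floating point.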

2.5 Elastic product (ep) 

Definition 2.5. A function $\langle .,. \rangle_{ep}: U^* \times U^* \to \mathbb{R}$ is called an Elastic Product if there exist a function $f: S^2 \to \mathbb{R}$, a strictly positive function $g: T^2 \to \mathbb{R}^+$ and three constants $\alpha$, $\beta$ and $\xi$ in $\mathbb{R}$ such that, for any pair of sequences $A_1^p, B_1^q$, the following recursive equation holds:

$$\langle A_1^p, B_1^q \rangle_{ep} = \alpha \langle A_1^{p-1}, B_1^q \rangle_{ep} + \beta \langle A_1^{p-1}, B_1^{q-1} \rangle_{ep} + f(a(p), b(q))\, g(t_{a(p)}, t_{b(q)}) + \alpha \langle A_1^p, B_1^{q-1} \rangle_{ep} \qquad (1)$$

This recursive definition requires an initialization. To that end we set, $\forall A \in U^*$, $\langle A, \Omega \rangle_{ep} = \langle \Omega, A \rangle_{ep} = \langle \Omega, \Omega \rangle_{ep} = \xi$, where $\xi$ is a real constant (typically we set $\xi = 0$) and $\Omega$ is the null sequence, with the convention that $A_i^j = \Omega$ whenever $i > j$.
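Evaluated bottom-up, the recursion of Eq. (1) fills a $(p+1) \times (q+1)$ table, which is where the $O(p \cdot q)$ cost discussed in Section 2.7 comes from. The sketch below is an illustration under stated assumptions (univariate values; f, g, $\alpha$, $\beta$ and $\xi$ passed in as parameters; `elastic_product` is a hypothetical name); row and column 0 hold the initialization value $\xi$.

```python
import math
from typing import Callable, List, Tuple

Series = List[Tuple[float, float]]  # (value, timestamp) pairs, univariate


def elastic_product(a: Series, b: Series,
                    f: Callable[[float, float], float],
                    g: Callable[[float, float], float],
                    alpha: float, beta: float, xi: float = 0.0) -> float:
    """Bottom-up evaluation of Eq. (1); K[i][j] holds <A_1^i, B_1^j>_ep."""
    p, q = len(a), len(b)
    # Initialization: any product involving the null sequence equals xi.
    K = [[xi] * (q + 1) for _ in range(p + 1)]
    for i in range(1, p + 1):
        va, ta = a[i - 1]
        for j in range(1, q + 1):
            vb, tb = b[j - 1]
            K[i][j] = (alpha * K[i - 1][j]
                       + beta * K[i - 1][j - 1]
                       + f(va, vb) * g(ta, tb)
                       + alpha * K[i][j - 1])
    return K[p][q]


def prod(x: float, y: float) -> float:
    """f: the usual product on S = R."""
    return x * y


def gauss(s: float, t: float) -> float:
    """g: a Gaussian of the time difference (nu = 1, an arbitrary choice)."""
    return math.exp(-(s - t) ** 2)


A = [(1.0, 0.0), (2.0, 1.0)]
B = [(3.0, 0.0)]
print(elastic_product(A, B, prod, gauss, alpha=1.0, beta=-1.0))  # 3 + 6/e, about 5.207
```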

This paper addresses the central question of the existence of elastic inner products on the set $U^*$, i.e. without any restriction on the lengths of the considered time series or on the way they are sampled. While the choice of the functions f and g, although constrained, is potentially large, we show hereinafter that the choice of the constants $\alpha$, $\beta$ and $\xi$ is unique.

2.6 Existence of elastic inner products (eip) defined on U*

Theorem 2.1. $\langle .,. \rangle_{ep}$ is an inner product on $(U^*, \oplus, \otimes)$ iff:

i) $\xi = 0$,

ii) $g: (T \times T) \to \mathbb{R}$ is symmetric and strictly positive,

iii) f is an inner product on $(S, \oplus_S, \otimes_S)$, if we extend the domain of f on S by setting $f(0_S, 0_S) = 0$,

iv) $\alpha = 1$ and $\beta = -1$.
Fig. 1. The $\oplus$ binary operator applied to two discrete time series of different lengths that are not uniformly sampled. Co-occurring events have been slightly separated at the top of the figure for readability purposes.



2.6.1 Proof of Theorem 2.1
Proof of the direct implication

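Let us suppose first that $\langle .,. \rangle_{ep}$ is an inner product defined on $U^*$. Then, since $\langle .,. \rangle_{ep}$ is positive-definite, necessarily $\langle \Omega, \Omega \rangle_{ep} = \xi = 0$. Furthermore, for any $A_1 = (a, t_1)$ and $A_2 = (a, t_2) \in U^*$ with $a \ne 0_S$, we have $\langle A_1, A_2 \rangle_{ep} = g(t_1, t_2) \cdot f(a, a)$, where $f(a, a) \ne 0$ since $\langle A_1, A_1 \rangle_{ep} = g(t_1, t_1) \cdot f(a, a) > 0$. Since $\langle .,. \rangle_{ep}$ is symmetric, we get that $g(t_1, t_2) = g(t_2, t_1)$ for any $(t_1, t_2) \in T^2$, which establishes that g is symmetric. Since g is strictly positive by definition of $\langle .,. \rangle_{ep}$, i) and ii) are satisfied.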

For any $A = (a, t_a) \in U^*$, $\langle A, A \rangle_{ep} = f(a, a)\, g(t_a, t_a) > 0$. Since g is strictly positive, we get that $f(a, a) > 0$. If we set $f(0_S, 0_S) = 0$, we establish that f is positive-definite on S.

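Since $\xi = 0$, for any A, B and $C \in U^*$ such that $A = (a, t)$, $B = (b, t)$ and $C = (c, t_c)$ (with $a \oplus_S b \ne 0_S$), we have:

$$\langle A \oplus B, C \rangle_{ep} = f(a \oplus_S b, c) \cdot g(t, t_c).$$

As $\langle A \oplus B, C \rangle_{ep} = \langle A, C \rangle_{ep} + \langle B, C \rangle_{ep} = f(a, c) \cdot g(t, t_c) + f(b, c) \cdot g(t, t_c) = (f(a, c) + f(b, c)) \cdot g(t, t_c)$,

and as g is strictly positive, we get that $f(a \oplus_S b, c) = f(a, c) + f(b, c)$.

Furthermore, $\langle \lambda \otimes A, C \rangle_{ep} = f(\lambda \otimes_S a, c) \cdot g(t, t_c)$. As $\langle \lambda \otimes A, C \rangle_{ep} = \lambda \langle A, C \rangle_{ep} = \lambda f(a, c) \cdot g(t, t_c)$ and g is strictly positive, we get that $f(\lambda \otimes_S a, c) = \lambda f(a, c)$.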



This shows that f is linear, symmetric (by the symmetry of $\langle .,. \rangle_{ep}$ and of g) and positive-definite. Hence it is an inner product on $(S, \oplus_S, \otimes_S)$, and iii) is satisfied.

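Let us show that necessarily $\alpha = 1$ and $\beta = -1$. To that end, let us consider any $A_1^p$, $B_1^q$ and $C_1^r$ in $U^*$ such that $p \ge 1$, $q \ge 1$, $r \ge 1$ and such that $t_{a(p)} < t_{b(q)}$, i.e. if $X_1^s = A_1^p \oplus B_1^q$, then $X_1^{s-1} = A_1^p \oplus B_1^{q-1}$.
Since by hypothesis $\langle .,. \rangle_{ep}$ is an inner product on $(U^*, \oplus, \otimes)$, it is linear and thus we can write:

$$\langle A_1^p \oplus B_1^q, C_1^r \rangle_{ep} = \langle A_1^p, C_1^r \rangle_{ep} + \langle B_1^q, C_1^r \rangle_{ep}.$$

Decomposing $\langle A_1^p \oplus B_1^q, C_1^r \rangle_{ep}$ with Eq. (1), we obtain:

$$\langle A_1^p \oplus B_1^q, C_1^r \rangle_{ep} = \alpha \langle A_1^p \oplus B_1^{q-1}, C_1^r \rangle_{ep} + \beta \langle A_1^p \oplus B_1^{q-1}, C_1^{r-1} \rangle_{ep} + f(b(q), c(r))\, g(t_{b(q)}, t_{c(r)}) + \alpha \langle A_1^p \oplus B_1^q, C_1^{r-1} \rangle_{ep}.$$

As $\langle .,. \rangle_{ep}$ is linear, we get:

$$\begin{aligned} \langle A_1^p \oplus B_1^q, C_1^r \rangle_{ep} ={}& \alpha \langle A_1^p, C_1^r \rangle_{ep} + \alpha \langle B_1^{q-1}, C_1^r \rangle_{ep} + \beta \langle A_1^p, C_1^{r-1} \rangle_{ep} + \beta \langle B_1^{q-1}, C_1^{r-1} \rangle_{ep} \\ &+ f(b(q), c(r))\, g(t_{b(q)}, t_{c(r)}) + \alpha \langle A_1^p, C_1^{r-1} \rangle_{ep} + \alpha \langle B_1^q, C_1^{r-1} \rangle_{ep}. \end{aligned}$$

Hence, recognizing in the terms involving $B$ the decomposition of $\langle B_1^q, C_1^r \rangle_{ep}$ by Eq. (1):

$$\langle A_1^p \oplus B_1^q, C_1^r \rangle_{ep} = \alpha \langle A_1^p, C_1^r \rangle_{ep} + (\alpha + \beta) \langle A_1^p, C_1^{r-1} \rangle_{ep} + \langle B_1^q, C_1^r \rangle_{ep}.$$

If we decompose $\langle A_1^p, C_1^r \rangle_{ep}$, we get:

$$\begin{aligned} \langle A_1^p \oplus B_1^q, C_1^r \rangle_{ep} ={}& (\alpha^2 + \beta + \alpha) \langle A_1^p, C_1^{r-1} \rangle_{ep} + \alpha\beta \langle A_1^{p-1}, C_1^{r-1} \rangle_{ep} \\ &+ \alpha f(a(p), c(r))\, g(t_{a(p)}, t_{c(r)}) + \alpha^2 \langle A_1^{p-1}, C_1^r \rangle_{ep} + \langle B_1^q, C_1^r \rangle_{ep}. \end{aligned}$$

Thus, comparing with the linearity relation above, we have to identify

$$\langle A_1^p, C_1^r \rangle_{ep} = \alpha \langle A_1^{p-1}, C_1^r \rangle_{ep} + \beta \langle A_1^{p-1}, C_1^{r-1} \rangle_{ep} + f(a(p), c(r))\, g(t_{a(p)}, t_{c(r)}) + \alpha \langle A_1^p, C_1^{r-1} \rangle_{ep}$$

with

$$(\alpha^2 + \beta + \alpha) \langle A_1^p, C_1^{r-1} \rangle_{ep} + \alpha\beta \langle A_1^{p-1}, C_1^{r-1} \rangle_{ep} + \alpha f(a(p), c(r))\, g(t_{a(p)}, t_{c(r)}) + \alpha^2 \langle A_1^{p-1}, C_1^r \rangle_{ep}.$$

Identifying the $f \cdot g$ terms gives $\alpha = 1$, and identifying the $\langle A_1^p, C_1^{r-1} \rangle_{ep}$ terms then gives $1 = 2 + \beta$. The unique solution is therefore $\alpha = 1$ and $\beta = -1$. That is, if $\langle .,. \rangle_{ep}$ is an inner product, then necessarily $\alpha = 1$ and $\beta = -1$, establishing iv).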

Proof of the converse implication 

Let us suppose that i), ii), iii) and iv) are satisfied and show that $\langle .,. \rangle_{ep}$ is an inner product on $U^*$.

First, by construction, since f and g are symmetric, so is $\langle .,. \rangle_{ep}$.

It is easy to show by induction that $\langle .,. \rangle_{ep}$ is non-decreasing with the length of its arguments, namely, $\forall A_1^p$ and $B_1^q$ in $U^*$, $\langle A_1^p, B_1^q \rangle_{ep} - \langle A_1^p, B_1^{q-1} \rangle_{ep} \ge 0$. Let $n = p + q$. The proposition is true at rank $n = 0$. It is also true if $A_1^p = \Omega$, whatever $B_1^q$ is, or if $B_1^q = \Omega$, whatever $A_1^p$ is. Suppose it is true at a rank $n \ge 0$, and consider $A_1^p \ne \Omega$ and $B_1^q \ne \Omega$ such that $p + q = n + 1$.

By decomposing $\langle A_1^p, B_1^q \rangle_{ep}$, we get:

$$\langle A_1^p, B_1^q \rangle_{ep} - \langle A_1^p, B_1^{q-1} \rangle_{ep} = \langle A_1^{p-1}, B_1^q \rangle_{ep} - \langle A_1^{p-1}, B_1^{q-1} \rangle_{ep} + f(a(p), b(q))\, g(t_{a(p)}, t_{b(q)}).$$

Since $f(a(p), b(q))\, g(t_{a(p)}, t_{b(q)}) > 0$ and the proposition is true by the inductive hypothesis at rank n, we get that $\langle A_1^p, B_1^q \rangle_{ep} - \langle A_1^p, B_1^{q-1} \rangle_{ep} \ge 0$. By induction, the proposition is proved.

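Let us show by induction on the length of the time series the positive definiteness of $\langle .,. \rangle_{ep}$. At rank 0, we have $\langle \Omega, \Omega \rangle_{ep} = \xi = 0$. At rank 1, let us consider any time series of length 1, $A_1^1$: $\langle A_1^1, A_1^1 \rangle_{ep} = f(a(1), a(1))\, g(t_{a(1)}, t_{a(1)}) > 0$ by hypothesis on f and g. Let us suppose that the proposition is true at rank $n \ge 1$ and let us consider any time series of length $n + 1$, $A_1^{n+1}$. Then, since $\alpha = 1$ and $\beta = -1$, we get

$$\langle A_1^{n+1}, A_1^{n+1} \rangle_{ep} = \langle A_1^{n}, A_1^{n+1} \rangle_{ep} - \langle A_1^{n}, A_1^{n} \rangle_{ep} + f(a(n+1), a(n+1))\, g(t_{a(n+1)}, t_{a(n+1)}) + \langle A_1^{n+1}, A_1^{n} \rangle_{ep}.$$

Since $\langle A_1^{n+1}, A_1^{n} \rangle_{ep} - \langle A_1^{n}, A_1^{n} \rangle_{ep} \ge 0$, $\langle A_1^{n}, A_1^{n+1} \rangle_{ep} = \langle A_1^{n+1}, A_1^{n} \rangle_{ep} \ge \langle A_1^{n}, A_1^{n} \rangle_{ep} > 0$, and $f(a(n+1), a(n+1))\, g(t_{a(n+1)}, t_{a(n+1)}) > 0$, we get $\langle A_1^{n+1}, A_1^{n+1} \rangle_{ep} > 0$, showing that the proposition is true at rank $n + 1$. By induction, the proposition is proved, which establishes the positive-definiteness of $\langle .,. \rangle_{ep}$, since $\langle A_1^p, A_1^p \rangle_{ep} = 0$ only if $A_1^p = \Omega$.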

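Let us consider any $\lambda \in \mathbb{R}$ and any $A_1^p$, $B_1^q$ in $U^*$, and show by induction on $n = p + q$ that $\langle \lambda \otimes A_1^p, B_1^q \rangle_{ep} = \lambda \langle A_1^p, B_1^q \rangle_{ep}$.

The proposition is true at rank $n = 0$. Let us suppose that the proposition is true at rank $n \ge 0$, i.e. for all $r \le n$, and consider any pair $A_1^p$, $B_1^q$ of time series such that $p + q = n + 1$.

We have:

$$\langle \lambda \otimes A_1^p, B_1^q \rangle_{ep} = \alpha \langle \lambda \otimes A_1^p, B_1^{q-1} \rangle_{ep} + \beta \langle \lambda \otimes A_1^{p-1}, B_1^{q-1} \rangle_{ep} + f(\lambda \otimes_S a(p), b(q))\, g(t_{a(p)}, t_{b(q)}) + \alpha \langle \lambda \otimes A_1^{p-1}, B_1^q \rangle_{ep}.$$

Since f is linear on $(S, \oplus_S, \otimes_S)$, and since the proposition is true by hypothesis at rank n, we get that

$$\langle \lambda \otimes A_1^p, B_1^q \rangle_{ep} = \lambda\alpha \langle A_1^p, B_1^{q-1} \rangle_{ep} + \lambda\beta \langle A_1^{p-1}, B_1^{q-1} \rangle_{ep} + \lambda f(a(p), b(q))\, g(t_{a(p)}, t_{b(q)}) + \lambda\alpha \langle A_1^{p-1}, B_1^q \rangle_{ep} = \lambda \langle A_1^p, B_1^q \rangle_{ep}.$$

By induction, the proposition is true for any n, which proves the homogeneity of $\langle .,. \rangle_{ep}$.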

Furthermore, for any $A_1^p$, $B_1^q$ and $C_1^r$ in $U^*$, let us show by induction on $n = p + q + r$ that $\langle A_1^p \oplus B_1^q, C_1^r \rangle_{ep} = \langle A_1^p, C_1^r \rangle_{ep} + \langle B_1^q, C_1^r \rangle_{ep}$. Let $X_1^s$ be equal to $A_1^p \oplus B_1^q$. The proposition is obviously true at rank $n = 0$. Let us suppose that it is true up to rank $n \ge 0$, and consider any $A_1^p$, $B_1^q$ and $C_1^r$ such that $p + q + r = n + 1$.

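Three cases then need to be considered:

1) if $X_1^{s-1} = A_1^{p-1} \oplus B_1^{q-1}$, then $t_{a(p)} = t_{b(q)} = t$ and

$$\langle A_1^p \oplus B_1^q, C_1^r \rangle_{ep} = \alpha \langle A_1^p \oplus B_1^q, C_1^{r-1} \rangle_{ep} + \beta \langle A_1^{p-1} \oplus B_1^{q-1}, C_1^{r-1} \rangle_{ep} + f(a(p) \oplus_S b(q), c(r))\, g(t, t_{c(r)}) + \alpha \langle A_1^{p-1} \oplus B_1^{q-1}, C_1^r \rangle_{ep}.$$

Since f is linear on $(S, \oplus_S, \otimes_S)$ and the proposition is true at rank n, we get the result.

2) if $X_1^{s-1} = A_1^p \oplus B_1^{q-1}$, then $t_{a(p)} < t_{b(q)} = t$ and

$$\langle A_1^p \oplus B_1^q, C_1^r \rangle_{ep} = \alpha \langle A_1^p \oplus B_1^q, C_1^{r-1} \rangle_{ep} + \beta \langle A_1^p \oplus B_1^{q-1}, C_1^{r-1} \rangle_{ep} + f(b(q), c(r))\, g(t, t_{c(r)}) + \alpha \langle A_1^p \oplus B_1^{q-1}, C_1^r \rangle_{ep}.$$

Having $\alpha = 1$ and $\beta = -1$, with the proposition supposed to be true at rank n, we get the result.

3) if $X_1^{s-1} = A_1^{p-1} \oplus B_1^q$, we proceed similarly to case 2).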

Thus the proposition is true at rank $n + 1$, and by induction the proposition is true for all n. This establishes the linearity of $\langle .,. \rangle_{ep}$.

This ends the proof of the converse implication, and Theorem 2.1 is therefore established. □
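The properties established above can be checked numerically on a few random, non-uniformly sampled series. This is a sanity check, not a proof: it assumes univariate values, $f(x, y) = xy$ and a Gaussian g (ν = 1) in the spirit of Prop. 2.2; `add`, `ep` and `random_series` are hypothetical helpers re-declared here so that the snippet is self-contained. It also exhibits a small example on which $\beta = 0$ breaks additivity, consistent with condition iv).

```python
import math
import random


def add(a, b):
    """Def. 2.4 on (value, timestamp) lists (univariate sketch)."""
    c, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        (va, ta), (vb, tb) = a[i], b[j]
        if ta < tb:
            c.append((va, ta))
            i += 1
        elif ta > tb:
            c.append((vb, tb))
            j += 1
        else:
            if va + vb != 0.0:
                c.append((va + vb, ta))
            i += 1
            j += 1
    return c + a[i:] + b[j:]


def ep(a, b, alpha=1.0, beta=-1.0, nu=1.0):
    """Eq. (1) with xi = 0, f(x, y) = x*y and g(t, t') = exp(-nu*(t - t')**2)."""
    K = [[0.0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, (va, ta) in enumerate(a, 1):
        for j, (vb, tb) in enumerate(b, 1):
            K[i][j] = (alpha * K[i - 1][j] + beta * K[i - 1][j - 1]
                       + va * vb * math.exp(-nu * (ta - tb) ** 2)
                       + alpha * K[i][j - 1])
    return K[len(a)][len(b)]


def random_series(rng, n):
    """n non-zero values at strictly increasing, non-uniform timestamps."""
    stamps = sorted(rng.sample(range(100), n))
    return [(rng.choice([-1, 1]) * rng.uniform(0.5, 2.0), s / 10) for s in stamps]


rng = random.Random(0)
A, B, C = random_series(rng, 5), random_series(rng, 7), random_series(rng, 4)
lam = -1.7

# With alpha = 1 and beta = -1, the gaps below are at round-off level.
print(abs(ep(A, B) - ep(B, A)))                                   # symmetry
print(abs(ep(add(A, B), C) - (ep(A, C) + ep(B, C))))              # additivity
print(abs(ep([(lam * v, t) for v, t in A], C) - lam * ep(A, C)))  # homogeneity
print(ep(A, A) > 0)                                               # positivity here

# With beta = 0, additivity fails on this small example (the gap equals 1).
a, b, c = [(1.0, 0.0)], [(1.0, 1.0)], [(1.0, 0.0), (1.0, 1.0)]
print(ep(add(a, b), c, beta=0.0) - ep(a, c, beta=0.0) - ep(b, c, beta=0.0))
```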

The existence of functions f and g entering into the definition of $\langle .,. \rangle_{ep}$ and satisfying the conditions allowing for the construction of an inner product on $(U^*, \oplus, \otimes)$ is ensured by the following proposition:

Proposition 2.2. The functions $f: S^2 \to \mathbb{R}$ defined as $f(a, b) = \langle a, b \rangle_S$, where $\langle .,. \rangle_S$ is an inner product on $(S, \oplus_S, \otimes_S)$, and $g: T^2 \to \mathbb{R}$ defined as $g(t_a, t_b) = e^{-\nu \cdot d(t_a, t_b)}$, where d is a distance defined on $T^2$ and $\nu \in \mathbb{R}^+$, satisfy the conditions required to construct an elastic inner product on $(U^*, \oplus, \otimes)$.

The proof of Prop. 2.2 is obvious. This proposition establishes the existence of ep inner products, which we will denote eip (Time Elastic Inner Product). An eip thus has the following structure:



$$\langle A_1^p, B_1^q \rangle_{eip} = \langle A_1^{p-1}, B_1^q \rangle_{eip} - \langle A_1^{p-1}, B_1^{q-1} \rangle_{eip} + \langle a(p), b(q) \rangle_S \, g(t_{a(p)}, t_{b(q)}) + \langle A_1^p, B_1^{q-1} \rangle_{eip} \qquad (2)$$

With the initialization: $\forall A \in U^*$, $\langle A, \Omega \rangle_{eip} = \langle \Omega, A \rangle_{eip} = \langle \Omega, \Omega \rangle_{eip} = \xi$, where $\xi = 0$ as required by condition i) of Theorem 2.1, and $\Omega$ is the null sequence, with the convention that $A_i^j = \Omega$ whenever $i > j$.

Note that $\langle .,. \rangle_S$ can itself be chosen to be an eip, in the case where a second time elastic dimension is required. This leads naturally to recursive definitions for ep and eip.



Proposition 2.3. For any $n \in \mathbb{N}$ and any discrete subset $T = \{t_1, t_2, \ldots, t_n\} \subset \mathbb{R}$, let $U_{n,\mathbb{R},T}$ be the set of all time series defined on $\mathbb{R} \times T$ whose lengths are n (the time series in $U_{n,\mathbb{R},T}$ are considered to be uniformly sampled). Then the eip on $U_{n,\mathbb{R},T}$ constructed from the functions f and g defined in Prop. 2.2 tends towards the Euclidean inner product when $\nu \to \infty$, if S is a Euclidean space and $\langle a, b \rangle_S$ is the Euclidean inner product defined on S.

The proof of Prop. 2.3 is straightforward and is omitted. This proposition shows that eip generalizes the classical Euclidean inner product.
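To make the link with Prop. 2.3 (and with the matrix form used in Section 3.3) explicit, here is a short derivation under the conditions of Theorem 2.1 ($\alpha = 1$, $\beta = -1$, $\xi = 0$). Writing $K(p, q) = \langle A_1^p, B_1^q \rangle_{eip}$, Eq. (2) is a two-dimensional inclusion-exclusion relation, and summing it over $1 \le i \le p$, $1 \le j \le q$ telescopes against the zero boundary values:

$$\begin{aligned}
K(i,j) - K(i-1,j) - K(i,j-1) + K(i-1,j-1) &= \langle a(i), b(j) \rangle_S \, g(t_{a(i)}, t_{b(j)}), \qquad K(i,0) = K(0,j) = 0,\\
\Longrightarrow \quad \langle A_1^p, B_1^q \rangle_{eip} = K(p,q) &= \sum_{i=1}^{p} \sum_{j=1}^{q} \langle a(i), b(j) \rangle_S \, g(t_{a(i)}, t_{b(j)}).
\end{aligned}$$

With $g(t, t') = e^{-\nu\, d(t, t')}$ as in Prop. 2.2, $g(t, t) = 1$ and $g(t, t') \to 0$ for $t \ne t'$ when $\nu \to \infty$, so for two series sampled on the same timestamps the double sum tends to $\sum_i \langle a(i), b(i) \rangle_S$, the Euclidean inner product. The same closed form shows that the non-negativity of $\langle A, A \rangle_{eip}$ amounts to the positive semi-definiteness of the kernel $((a, t), (a', t')) \mapsto f(a, a')\, g(t, t')$, which holds for instance when g is a positive-definite kernel on T, such as the Gaussian used in Eq. (3).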



2.7 Algorithmic complexity 

The general complexity associated with the calculation of the eip of two time series of lengths p and q, as specified by Eq. (2), is $O(p \cdot q)$. This is basically the same complexity as that required for the calculation of complete dynamic programming solutions such as the Dynamic Time Warping (DTW) [20], [13] algorithm evaluated on these two time series. Nevertheless, as discussed in Section 3.3, Prop. 3.2 allows for efficient implementations of the retrieval process (with linear time complexity) once a straightforward indexing phase has been carried out. This result is demonstrated in practice in the experimentation section (Sec. 4).

3 Some preliminary applications 

We present in the following sections some applications 
to highlight the properties of Elastic Inner Product Vector 
Spaces (EIPVS). 

3.1 Distance in EIPVS 

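The following proposition provides $U^*$ with a norm and a distance, both induced by an eip.

Proposition 3.1. For all $A_1^p \in U^*$ and any eip $\langle .,. \rangle_{eip}$ defined on $(U^*, \oplus, \otimes)$, $\sqrt{\langle A_1^p, A_1^p \rangle_{eip}}$ is a norm on $U^*$.

For all pairs $(A_1^p, B_1^q) \in (U^*)^2$ and any eip defined on $(U^*, \oplus, \otimes)$,

$$\delta_{eip}(A_1^p, B_1^q) = \sqrt{\langle A_1^p \oplus (-1 \otimes B_1^q), A_1^p \oplus (-1 \otimes B_1^q) \rangle_{eip}} = \sqrt{\langle A_1^p, A_1^p \rangle_{eip} + \langle B_1^q, B_1^q \rangle_{eip} - 2 \langle A_1^p, B_1^q \rangle_{eip}}$$

defines a distance metric on $U^*$.

The proof of Prop. 3.1 is straightforward and is omitted.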
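A sketch of $\delta_{eip}$ computed through the expansion of Prop. 3.1, using the eip of Eq. (3) with univariate values; `eip` and `delta_eip` are hypothetical names. For a very large ν and two series sampled on the same timestamps, the distance falls back to the Euclidean one, as Prop. 2.3 predicts, while a moderate ν tolerates the one-step shift of the peak between A and B.

```python
import math
from typing import List, Tuple

Series = List[Tuple[float, float]]  # (value, timestamp) pairs, univariate


def eip(a: Series, b: Series, nu: float) -> float:
    """Eq. (3): alpha = 1, beta = -1, f(x, y) = x*y, g = exp(-nu*|t - t'|^2)."""
    K = [[0.0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, (va, ta) in enumerate(a, 1):
        for j, (vb, tb) in enumerate(b, 1):
            K[i][j] = (K[i - 1][j] - K[i - 1][j - 1] + K[i][j - 1]
                       + va * vb * math.exp(-nu * (ta - tb) ** 2))
    return K[len(a)][len(b)]


def delta_eip(a: Series, b: Series, nu: float) -> float:
    """Distance of Prop. 3.1; tiny negative round-off is clipped to zero."""
    sq = eip(a, a, nu) + eip(b, b, nu) - 2.0 * eip(a, b, nu)
    return math.sqrt(max(sq, 0.0))


A = [(1.0, 0.0), (2.0, 0.1), (1.0, 0.2)]
B = [(1.0, 0.0), (1.0, 0.1), (2.0, 0.2)]
C = [(2.0, 0.05)]  # a different length and a different sampling

print(delta_eip(A, B, nu=1e6))   # stiff limit: sqrt(2), the Euclidean distance
print(delta_eip(A, B, nu=10.0))  # elastic: smaller than sqrt(2)
print(delta_eip(A, C, nu=10.0), delta_eip(C, B, nu=10.0))
```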



3.2 Orthogonalization in EIPVS 

To exemplify the effect of elasticity in EIPVS, we give below the result of the Gram-Schmidt orthogonalization algorithm for two families of independent univariate time series. The first family is composed of uniformly sampled time series having increasing lengths. The second family (a sine-cosine basis) is composed of uniformly sampled time series, all of which have the same length.

The tests described in the next sections were performed on a set $U^*$ of discrete time series whose elements are defined on $(\mathbb{R} - \{0\}) \times [0; 1]$, using the following eip:



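$$\langle A_1^p, B_1^q \rangle_{eip} = \langle A_1^{p-1}, B_1^q \rangle_{eip} - \langle A_1^{p-1}, B_1^{q-1} \rangle_{eip} + a(p)\, b(q)\, e^{-\nu |t_{a(p)} - t_{b(q)}|^2} + \langle A_1^p, B_1^{q-1} \rangle_{eip} \qquad (3)$$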
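Because the eip of Eq. (3) is bilinear, two series written as dense vectors on a shared timestamp grid (zeros standing for missing samples) satisfy $\langle x, y \rangle_{eip} = x^T E y$ with $E_{ij} = e^{-\nu |t_i - t_j|^2}$, so Gram-Schmidt can be run directly with that matrix. The sketch below is illustrative only: it applies modified Gram-Schmidt to the 11 spikes of the increasing-length family of Eq. (4) (the $\epsilon$ placeholders become plain zeros in this embedding), and it uses $\nu = 50$ rather than the $\nu = .01$ reported for Fig. 2, because in this sketch a very small ν makes E too ill-conditioned for a plain double-precision pass. It does not attempt to reproduce Fig. 2; helper names are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def elastic_matrix(times: List[float], nu: float) -> Matrix:
    """E[i][j] = exp(-nu * |t_i - t_j|^2), the Gaussian g of Eq. (3)."""
    return [[math.exp(-nu * (ti - tj) ** 2) for tj in times] for ti in times]


def inner(x: Vector, y: Vector, E: Matrix) -> float:
    """x^T E y, i.e. the eip of two series embedded on the same grid."""
    return sum(x[i] * E[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))


def gram_schmidt(family: List[Vector], E: Matrix) -> List[Vector]:
    """Modified Gram-Schmidt w.r.t. <x, y> = x^T E y; returns an orthonormal family."""
    basis: List[Vector] = []
    for v in family:
        w = list(v)
        for u in basis:
            c = inner(w, u, E)  # u already has unit elastic norm
            w = [wi - c * ui for wi, ui in zip(w, u)]
        norm = math.sqrt(inner(w, w, E))
        basis.append([wi / norm for wi in w])
    return basis


times = [k / 10 for k in range(11)]  # timestamps 0, 1/10, ..., 1 of Eq. (4)
# k-th member of the family: a unit spike at t_k (earlier samples are zeros).
family = [[1.0 if i == k else 0.0 for i in range(11)] for k in range(11)]
E = elastic_matrix(times, nu=50.0)
basis = gram_schmidt(family, E)

# Second element: a positive spike at t_1 paired with a negative one at t_0.
print([round(x, 3) for x in basis[1][:3]])
```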



Fig. 2. Result of the orthogonalization of the family of time series with increasing lengths defined in Eq. (4), using $\nu = .01$: except for the first spike, located at time 0, each original spike is replaced by two spikes, one negative and the other positive.
Fig. 3. Orthogonalization of the sine-cosine basis using $\nu = .01$: the waves are slightly deformed jointly in amplitude and in frequency. For readability, only the first 8 components are shown.

3.2.1 Orthogonalization of an independent family of time series with increasing lengths

The family of time series we are considering is composed of 11 uniformly sampled time series, whose lengths range from 1 to 11 samples:

$$\begin{array}{l}
(1, 0)\\
(\epsilon, 0)(1, 1/10)\\
(\epsilon, 0)(\epsilon, 1/10)(1, 2/10)\\
\quad \vdots\\
(\epsilon, 0)(\epsilon, 1/10)(\epsilon, 2/10) \cdots (1, 1)
\end{array} \qquad (4)$$

Since the zero value cannot be used for the space dimension, we replaced it by $\epsilon$, which is the smallest non-zero positive real for our test machine (i.e. $2^{-1074}$). The result of the Gram-Schmidt orthogonalization process using $\nu = .01$ on this basis is given in Fig. 2.

3.2.2 Orthogonalization of a sine-cosine basis

An orthonormal family of discrete sine-cosine functions is no longer orthogonal in an EIPVS. The result of the Gram-Schmidt orthogonalization process using $\nu = .01$ when applied to a discrete sine-cosine basis is given in Fig. 3, in which only the first 8 components are displayed. The lengths of the waves are 128 samples.

3.3 Indexing for fast retrieval in time series databases

Prop. 2.3 shows how the elastic inner product generalizes the Euclidean inner product when time series are embedded into a finite-dimensional vector space (one dimension per timestamp). In this subsection we consider such embeddings, with the convention that if a time series has no value for a given timestamp, a zero value is added on the dimension corresponding to the missing timestamp. Each time series is thus represented by a finite-dimensional vector, say in $\mathbb{R}^n$.

Proposition 3.2. Given a symmetric and strictly positive function g, let us consider the so-called $n \times n$ symmetric elastic matrix in $\mathbb{R}^{n \times n}$ defined as:


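$$E = \begin{pmatrix}
g(t_1,t_1) & g(t_1,t_2) & g(t_1,t_3) & \cdots & g(t_1,t_n)\\
g(t_2,t_1) & g(t_2,t_2) & g(t_2,t_3) & \cdots & g(t_2,t_n)\\
g(t_3,t_1) & g(t_3,t_2) & g(t_3,t_3) & \cdots & g(t_3,t_n)\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
g(t_n,t_1) & g(t_n,t_2) & g(t_n,t_3) & \cdots & g(t_n,t_n)
\end{pmatrix}$$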



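and two time series $A_1^n$ and $B_1^n$ represented by two vectors in $\mathbb{R}^n$. Then the elastic inner product defined recursively as:

$$\langle A_1^n, B_1^n \rangle_{eip} = \langle A_1^{n-1}, B_1^n \rangle_{eip} - \langle A_1^{n-1}, B_1^{n-1} \rangle_{eip} + a(n)\, b(n)\, g(t_{a(n)}, t_{b(n)}) + \langle A_1^n, B_1^{n-1} \rangle_{eip} \qquad (5)$$

identifies with the matrix product $[A_1^n]^T E\, [B_1^n]$.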

This result is quite interesting in the scope of time series information retrieval, especially when addressing very large databases. Let $D = \{B(i)_1^n\}_{i=1 \ldots m}$ be a time series database of size m, and consider the indexing phase that consists in constructing $D_E = \{E\,[B(i)_1^n]\}_{i=1 \ldots m}$. Then the retrieval of all the time series in D that are elastically similar to a given request $A_1^n$ requires the computation of $\langle A_1^n, B(i)_1^n \rangle_{eip}$ for $i \in \{1, \ldots, m\}$, which reduces to evaluating:

$$\langle A_1^n, B(i)_1^n \rangle_{eip} = [A_1^n]^T \left[E\, B(i)_1^n\right] \qquad (6)$$

for $i \in \{1, \ldots, m\}$, each of which is done with a complexity linear in n. Basically, this construction breaks the quadratic complexity of the eip at the retrieval stage and matches the linear complexity of the classical Euclidean inner product.
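A sketch of the indexing and retrieval scheme of Eq. (6), assuming series already embedded as dense vectors on a common grid of n timestamps (zeros for missing samples) and a Gaussian g as a hypothetical choice; function names are illustrative. The $O(n^2)$ product $E\,B(i)$ is paid once per database series; each query then costs $O(n)$ per series, plus one $O(n^2)$ product for $\langle A, A \rangle_{eip}$ when distances rather than inner products are ranked.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def elastic_matrix(times: List[float], nu: float) -> Matrix:
    """Hypothetical Gaussian choice g(t, t') = exp(-nu * |t - t'|^2)."""
    return [[math.exp(-nu * (s - t) ** 2) for t in times] for s in times]


def dot(x: Vector, y: Vector) -> float:
    return sum(a * b for a, b in zip(x, y))


def index_database(db: List[Vector], E: Matrix) -> List[Vector]:
    """Indexing phase (off-line, O(n^2) per series): store E B(i)."""
    return [[dot(row, b) for row in E] for b in db]


def retrieve(query: Vector, db: List[Vector], indexed: List[Vector],
             E: Matrix, k: int = 1) -> List[int]:
    """Indices of the k series closest to `query` for delta_eip.

    <A, B(i)>_eip = A^T (E B(i)) costs O(n) per series (Eq. (6)); the
    self-products <B(i), B(i)>_eip reuse the index, and <A, A>_eip is
    computed once per query.
    """
    aa = dot(query, [dot(row, query) for row in E])
    sq = [aa + dot(b, eb) - 2.0 * dot(query, eb) for b, eb in zip(db, indexed)]
    return sorted(range(len(db)), key=lambda i: sq[i])[:k]


times = [k / 4 for k in range(8)]
E = elastic_matrix(times, nu=4.0)
db = [[0, 1, 3, 1, 0, 0, 0, 0],   # peak one step earlier than the query
      [0, 0, 0, 0, 1, 3, 1, 0],   # peak two steps later
      [3, 3, 3, 3, 3, 3, 3, 3]]   # flat series
indexed = index_database(db, E)
query = [0, 0, 1, 3, 1, 0, 0, 0]
print(retrieve(query, db, indexed, E, k=1))  # [0]
```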

Moreover, one can notice that the eip defined by Eq. (5) has a continuous-time equivalent that can be expressed as:

$$\langle a, b \rangle = \iint a(t)\, b(\tau)\, g(t, \tau)\, dt\, d\tau$$

3.4 Kernel methods in EIPVS 

A wide range of literature exists on kernel theory, among which [3], [15] and [16] present some large syntheses of major results. We give hereinafter basic definitions and immediate results regarding kernel construction based on eip.

Definition 3.1. A kernel on a non-empty set U refers to a complex (or real) valued symmetric function $\varphi(x, y): U \times U \to \mathbb{C}$ (or $\mathbb{R}$).

Definition 3.2. Let U be a non-empty set. A function $\varphi: U \times U \to \mathbb{C}$ is called a positive (resp. negative) definite kernel if and only if it is Hermitian (i.e. $\varphi(x, y) = \overline{\varphi(y, x)}$, where the overline stands for the conjugate number) for all x and y in U, and $\sum_{i,j=1}^{n} c_i \bar{c}_j \varphi(x_i, x_j) \ge 0$ (resp. $\sum_{i,j=1}^{n} c_i \bar{c}_j \varphi(x_i, x_j) \le 0$), for all n in $\mathbb{N}$, $(x_1, x_2, \ldots, x_n) \in U^n$ and $(c_1, c_2, \ldots, c_n) \in \mathbb{C}^n$.

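Definition 3.3. Let U be a non-empty set. A function $\varphi: U \times U \to \mathbb{C}$ is called a conditionally positive (resp. conditionally negative) definite kernel if and only if it is Hermitian (i.e. $\varphi(x, y) = \overline{\varphi(y, x)}$ for all x and y in U) and $\sum_{i,j=1}^{n} c_i \bar{c}_j \varphi(x_i, x_j) \ge 0$ (resp. $\sum_{i,j=1}^{n} c_i \bar{c}_j \varphi(x_i, x_j) \le 0$), for all $n \ge 2$ in $\mathbb{N}$, $(x_1, x_2, \ldots, x_n) \in U^n$ and $(c_1, c_2, \ldots, c_n) \in \mathbb{C}^n$ with $\sum_{i=1}^{n} c_i = 0$.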

In the last two definitions above, it is easy to show that it is sufficient to consider mutually different elements in U, i.e. collections of distinct elements $x_1, x_2, \ldots, x_n$.

Definition 3.4. A positive (resp. negative) definite kernel 
defined on a finite set U is also called a positive (resp. 
negative) semidefinite matrix. Similarly a positive (resp. 
negative) conditionally definite kernel defined on a finite 
set is also called a positive (resp. negative) conditionally 
semidefinite matrix. 

3.4.1 Definiteness of eip-based kernels
Proposition 3.3. An eip is a positive definite kernel.

The proof of Prop. 3.3 is straightforward and is omitted. The consequence is that numerous definite (positive or negative) kernels can be derived from an eip; among other immediate derivations, it can be stated that:

• $(\langle .,. \rangle_{eip})^p$ is positive definite for all $p \in \mathbb{N}$ (polynomial kernel).

• $\delta_{eip}$ defined by Prop. 3.1 is conditionally negative definite.

• $e^{-\nu \cdot \delta_{eip}}$ is positive definite for all $\nu > 0$.

• $e^{-\nu \cdot (\delta_{eip})^p}$ is positive definite for all $\nu > 0$ and any $0 < p \le 2$.

• $e^{\langle A, B \rangle_{eip}}$ is positive definite.
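A numerical illustration (not a proof) of Prop. 3.3 and of two of the derived kernels, using the matrix form $x^T E y$ of Section 3.3 with a Gaussian g; the series, ν = 10 and the kernel parameter 0.5 are arbitrary choices, and helper names are hypothetical. Positive definiteness is tested by attempting a Cholesky factorization, which, for a symmetric matrix and in exact arithmetic, succeeds exactly when the matrix is positive definite.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def elastic_matrix(times: List[float], nu: float) -> Matrix:
    return [[math.exp(-nu * (s - t) ** 2) for t in times] for s in times]


def eip(x: List[float], y: List[float], E: Matrix) -> float:
    """Matrix form x^T E y of Section 3.3 (series on a shared grid)."""
    return sum(x[i] * E[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))


def cholesky_succeeds(K: Matrix) -> bool:
    """Attempt K = L L^T and report whether every pivot is positive."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True


rng = random.Random(1)
times = [k / 5 for k in range(6)]
E = elastic_matrix(times, nu=10.0)
series = [[rng.uniform(-1.0, 1.0) for _ in times] for _ in range(5)]

gram = [[eip(x, y, E) for y in series] for x in series]
delta2 = [[gram[i][i] + gram[j][j] - 2.0 * gram[i][j] for j in range(5)]
          for i in range(5)]
kernels = {
    "eip": gram,
    "(eip)^2 (polynomial)": [[v ** 2 for v in row] for row in gram],
    "exp(-0.5 * delta_eip^2)": [[math.exp(-0.5 * d) for d in row] for row in delta2],
}
for name, K in kernels.items():
    print(f"{name}: positive definite -> {cholesky_succeeds(K)}")
```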

Some experiments with Support Vector Machines involving elastic kernels are reported in Sec. 4.

3.5 Elastic cosine similarity in EIPVS, with application to symbolic (e.g. textual) information retrieval

Similarly to the definition of the cosine of two vectors in Euclidean space, we define the elastic cosine of two sequences by using any ep that satisfies the conditions of Theorem 2.1.

Definition 3.5. Given two sequences A and B, the elastic cosine similarity of these two sequences is given, using a time elastic inner product $\langle X, Y \rangle_e$ and the induced norm $\|X\|_e = \sqrt{\langle X, X \rangle_e}$, as

$$\text{similarity} = e\cos(\theta) = \frac{\langle A, B \rangle_e}{\|A\|_e\, \|B\|_e}$$

In the case of textual or sequential data information retrieval, namely text matching or sequence matching, the timestamps variable coincides with the index of words in the text or sequence, and the spatial dimensions encode the words or symbols in a given dictionary or alphabet. For instance, in text mining, each word can be represented using a vector whose dimension is the size of the set of concepts (or senses) that covers the conceptual domain associated with the dictionary, and each coordinate value, taken in [0; 1], encodes the degree of presence of the concept or sense in the considered word. In any case, the elastic cosine similarity measure takes values in [0; 1], 0 indicating the lowest possible similarity between two sequential data and 1 the greatest possible similarity. The elastic cosine similarity takes into account the order of occurrence of the words or symbols in the sequential data, which could be an advantage compared to the Euclidean cosine measure, which does not cope at all with the ordering of words or symbols.

Let us consider the following elastic inner product dedicated to text matching. In the following definition, $A_1^p$ and $B_1^q$ are sequences of words that represent textual content.

Definition 3.6.

$$\langle A_1^p, B_1^q \rangle_{eip_{tm}} = \langle A_1^{p-1}, B_1^q \rangle_{eip_{tm}} - \langle A_1^{p-1}, B_1^{q-1} \rangle_{eip_{tm}} + e^{-\nu |t_{a(p)} - t_{b(q)}|^2}\, \delta(a(p), b(q)) + \langle A_1^p, B_1^{q-1} \rangle_{eip_{tm}} \qquad (7)$$

where a(p) and b(q) are vectors whose coordinates identify words with weightings, $\delta(a, b) = \langle a, b \rangle$ is the Euclidean inner product, and $\nu$ is a time stiffness parameter.

Proposition 3.4. For $\nu = 0$ and $\delta$ redefined as $\delta(a, b) = 1$ if $a = b$, 0 otherwise, the elastic inner product defined in Eq. (7) coincides with the Euclidean inner product between two vectors whose coordinates correspond to the term frequencies observed in the $A_1^p$ and $B_1^q$ text sequences. If we change the definition of $\delta$ to $\delta(a, b) = (IDF(a))^2$ if $a = b$, 0 otherwise, where $IDF(a)$ is the inverse document frequency of term a in the considered collection, then for $\nu = 0$, $\langle A_1^p, B_1^q \rangle_{eip_{tm}}$ coincides with the Euclidean inner product between two vectors whose coordinates correspond to the TF-IDF (term frequency times inverse document frequency) of the terms occurring in the $A_1^p$ and $B_1^q$ text sequences.
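A sketch of Eq. (7) with the 0/1 $\delta$ of Prop. 3.4, assuming word positions 1, 2, ... as timestamps; function names are hypothetical. It checks on a toy pair of sentences that $\nu = 0$ reproduces the term-frequency inner product, and shows the IDF variant with made-up IDF weights (the `idf` dictionary is purely illustrative, not computed from any collection).

```python
import math
from collections import Counter
from typing import Callable, List


def eip_tm(a: List[str], b: List[str], nu: float,
           delta: Callable[[str, str], float]) -> float:
    """Eq. (7) on word sequences, with word positions as timestamps."""
    K = [[0.0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            K[i][j] = (K[i - 1][j] - K[i - 1][j - 1] + K[i][j - 1]
                       + math.exp(-nu * (i - j) ** 2) * delta(a[i - 1], b[j - 1]))
    return K[len(a)][len(b)]


def same_word(x: str, y: str) -> float:
    """delta of Prop. 3.4: 1 if the words are equal, 0 otherwise."""
    return 1.0 if x == y else 0.0


def tf_dot(a: List[str], b: List[str]) -> float:
    """Euclidean inner product of term-frequency vectors (bag of words)."""
    ta, tb = Counter(a), Counter(b)
    return float(sum(ta[w] * tb[w] for w in ta))


# Made-up IDF weights, for illustration only.
idf = {"the": 0.1, "cat": 1.2, "sat": 0.9, "on": 0.3, "mat": 1.5}


def same_word_idf(x: str, y: str) -> float:
    """IDF variant of Prop. 3.4: IDF(x)^2 if the words are equal, 0 otherwise."""
    return idf[x] ** 2 if x == y else 0.0


A = "the cat sat on the mat".split()
B = "the mat sat on the cat".split()
print(eip_tm(A, B, 0.0, same_word), tf_dot(A, B))  # equal when nu = 0
print(eip_tm(A, B, 1.0, same_word))                # word order now matters
```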

The proof of Prop. 3.4 is straightforward and is omitted.

Thus, the elastic cosine measure derived from the elastic inner product defined by Eq. (7) somehow generalizes the cosine measure implemented in the vector space model [14] and commonly used in the text information retrieval community.

To exemplify the behavior of the elastic cosine similarity on sequential data, we consider the four sequences given in Eq. (8). The variations of the elastic cosine as a function of $\nu$, evaluated on pairwise sequences, are reported in Fig. 4.

Fig. 4. Elastic cosine as a function of $\nu$ evaluated on pairwise sequences selected in {A = [abababab], B = [aaaabbbb], C = [bbbbaaaa], D = [bbaa]}. The timestamps correspond to the index of occurrence of the symbol in the sequence (i = 1, 2, ...).

$$\begin{array}{l}
A = [abababab]\\
B = [aaaabbbb]\\
C = [bbbbaaaa]\\
D = [bbaa]
\end{array} \qquad (8)$$

In Fig. 4, the extreme left part of the curves, characterized by high $\nu$ values, corresponds, when applied to sequences having the same length and represented by vectors, to the cosine similarity constructed with the Euclidean inner product. The right part of the curves, characterized by very low $\nu$ values, corresponds to the cosine similarity evaluated using the tf(-idf) vector space model. The center part of the figure, characterized by medium $\nu$ values, shows that elasticity allows a better discrimination between the sequences.
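The sequences of Eq. (8) can be compared with a short sketch of Definition 3.5 built on Eq. (7), with a 0/1 symbol match and positions as timestamps; the function names and the ν grid are illustrative, and the printed values are not claimed to match Fig. 4 point by point. At $\nu = 0$ every pair has an elastic cosine of 1 (all four sequences have balanced a/b counts, so the bag-of-symbols view cannot tell them apart), whereas for a very large ν only aligned positions match; intermediate values of ν interpolate between these two regimes.

```python
import math


def eip_sym(a: str, b: str, nu: float) -> float:
    """Eq. (7) on symbol strings: 0/1 symbol match, positions as timestamps."""
    K = [[0.0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            match = 1.0 if a[i - 1] == b[j - 1] else 0.0
            K[i][j] = (K[i - 1][j] - K[i - 1][j - 1] + K[i][j - 1]
                       + math.exp(-nu * (i - j) ** 2) * match)
    return K[len(a)][len(b)]


def elastic_cosine(a: str, b: str, nu: float) -> float:
    """Definition 3.5: <A, B>_e / (||A||_e ||B||_e)."""
    return eip_sym(a, b, nu) / math.sqrt(eip_sym(a, a, nu) * eip_sym(b, b, nu))


seqs = {"A": "abababab", "B": "aaaabbbb", "C": "bbbbaaaa", "D": "bbaa"}
pairs = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D"), ("C", "D")]
for nu in (0.0, 0.1, 1.0, 1e6):
    row = ", ".join(f"{x}{y}={elastic_cosine(seqs[x], seqs[y], nu):.3f}"
                    for x, y in pairs)
    print(f"nu={nu:g}: {row}")
```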

4 Experiments

4.1 Time series classification 

We empirically evaluate the effectiveness of the distance induced by an elastic inner product, compared with the Euclidean and the Dynamic Time Warping distances, on classification tasks over a set of time series coming from quite different application fields. The classification task we have considered consists of assigning one of the possible categories to an unknown time series, for the 20 data sets available at the UCR repository [?]. As time is not explicitly given for these datasets, we used the index value of the samples as the timestamps for the whole experiment.

For each dataset, a training subset (TRAIN) is defined 
as well as an independent testing subset (TEST). We use 
the training sets to train two kinds of classifiers: 



• the first one is a first nearest neighbor (1-NN) classifier: first we select a training data set containing time series for which the correct category is known. To assign a category to an unknown time series selected from a testing data set (different from the training set), we select its nearest neighbor (in the sense of a distance or similarity measure) within the training data set, then assign the category of this nearest neighbor to the unknown time series. For that experiment, a leave-one-out procedure is performed on the training dataset to optimize the meta-parameter $\nu$ of the considered elastic distance.

• the second one is an SVM classifier [4], [19]
configured with a Gaussian RBF kernel whose
parameters are C > 0, a trade-off between
regularization and constraint violation, and σ, which
determines the width of the Gaussian function. To
determine the C and σ hyper parameter values, we
adopt a 5-fold cross-validation method on each
training subset. According to this procedure, given
a predefined training set TRAIN and a test set TEST,
we adapt the meta parameters based on the training
set TRAIN: we first divide TRAIN into 5 stratified
subsets TRAIN_1, TRAIN_2, …, TRAIN_5; then, for
each subset TRAIN_i, we use it as a new test set
and regard (TRAIN − TRAIN_i) as a new training
set. Based on the average error rate obtained on the
five classification tasks, the optimal values of the meta
parameters are selected as the ones leading to the
minimal average error rate. For the elastic kernel,
the meta parameter ν is also optimized using this
5-fold cross-validation method performed on the
training datasets.
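The leave-one-out tuning of ν for the 1-NN classifier can be sketched as follows. This is a generic illustration, not the authors' code: `make_dist(nu)` is a hypothetical factory returning a distance function (for instance δ_eip with stiffness ν), and the default grid is the one listed in Sec. 4.1.1.

```python
from typing import Callable, Dict, Sequence, Tuple


def loo_error_1nn(series: Sequence, labels: Sequence,
                  dist: Callable) -> float:
    """Leave-one-out error rate of a 1-NN classifier on the training set."""
    n = len(series)
    errors = 0
    for i in range(n):
        # Nearest neighbour of series[i] among all *other* training series.
        j_best = min((j for j in range(n) if j != i),
                     key=lambda j: dist(series[i], series[j]))
        errors += labels[j_best] != labels[i]
    return errors / n


def select_nu(series, labels, make_dist,
              grid=(100, 10, 1, 0.1, 0.01, 1e-5, 0)) -> Tuple[float, Dict]:
    """Pick the stiffness nu with the lowest leave-one-out 1-NN error.
    `make_dist(nu)` is a hypothetical factory returning a distance function;
    ties are resolved in favour of the earliest grid value."""
    scores = {nu: loo_error_1nn(series, labels, make_dist(nu)) for nu in grid}
    best = min(scores, key=scores.get)
    return best, scores
```

Each leave-one-out pass needs all pairwise training distances, so its cost is dominated by the distance evaluations; precomputing a distance matrix per ν value is a natural optimization.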

The classification error rates are then estimated on
the TEST datasets on the basis of the parameter values
optimized on the TRAIN datasets. We have used the
LIBSVM library [6] to implement the SVM classifiers.
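The stratified 5-fold selection of (C, σ²) described above can be sketched in plain Python. The round-robin stratification, the `error_rate` callback (which would train and test, for example, a LIBSVM model) and the reading of the grids of Sec. 4.1.1 as consecutive powers of two are assumptions made for illustration; all names are hypothetical.

```python
import itertools
from collections import defaultdict

# Assumed grids: the sets of Sec. 4.1.1 read as consecutive powers of two.
C_GRID = [2.0 ** p for p in range(-8, 11)]        # 2^-8 ... 2^10
SIGMA2_GRID = [2.0 ** p for p in range(-5, 11)]   # 2^-5 ... 2^10
GRID = list(itertools.product(C_GRID, SIGMA2_GRID))


def stratified_folds(labels, k=5):
    """Deal the indices of each class round-robin over k folds so that every
    fold roughly keeps the class proportions of the whole training set."""
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    pos = 0
    for y in sorted(by_class):
        for idx in by_class[y]:
            folds[pos % k].append(idx)
            pos += 1
    return folds


def cv_grid_search(labels, error_rate, grid=GRID, k=5):
    """Return the parameter tuple with the lowest average k-fold error.
    `error_rate(params, train_idx, test_idx)` is a hypothetical callback that
    trains a classifier (e.g. an SVM with kernel K_eip) on train_idx and
    returns its error rate on test_idx."""
    folds = stratified_folds(labels, k)
    all_idx = set(range(len(labels)))
    best, best_err = None, float("inf")
    for params in grid:
        errs = [error_rate(params, sorted(all_idx - set(fold)), fold)
                for fold in folds]
        avg = sum(errs) / k
        if avg < best_err:
            best, best_err = params, avg
    return best, best_err
```

For the elastic kernel, ν would simply become a third component of each parameter tuple, which multiplies the number of SVM trainings by the size of the ν grid.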

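We tested the time elastic inner product
⟨A, B⟩_eip (Eq. 3). Precisely, we used the time-warp
distance induced by ⟨·,·⟩_eip, basically:

$$\delta_{eip}(A,B) = \langle A-B,\, A-B\rangle_{eip}^{1/2} = \left(\langle A,A\rangle_{eip} + \langle B,B\rangle_{eip} - 2\,\langle A,B\rangle_{eip}\right)^{1/2}$$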

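To speed up the computation at exploitation stage, we
have used the construction proposed in Sec. 3.3, Prop. 3.2,
with the following elastic matrix E, where ν > 0:

$$E = \begin{pmatrix}
1 & e^{-\nu|t_1-t_2|^2} & e^{-\nu|t_1-t_3|^2} & \cdots & e^{-\nu|t_1-t_n|^2} \\
e^{-\nu|t_2-t_1|^2} & 1 & e^{-\nu|t_2-t_3|^2} & \cdots & e^{-\nu|t_2-t_n|^2} \\
e^{-\nu|t_3-t_1|^2} & e^{-\nu|t_3-t_2|^2} & 1 & \cdots & e^{-\nu|t_3-t_n|^2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
e^{-\nu|t_n-t_1|^2} & e^{-\nu|t_n-t_2|^2} & e^{-\nu|t_n-t_3|^2} & \cdots & 1
\end{pmatrix}$$

that is, E_ij = e^{−ν|t_i − t_j|²}.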

The databases D_E = {E·B(i) | i = 1, …, l} are thus
constructed off-line from the TRAIN datasets, once the
optimization procedure of the ν parameter has been
completed. D_E is then exploited on-line by the 1-NN and
SVM classifiers.
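The off-line/on-line split can be illustrated with the following NumPy sketch. It assumes that all series share the time grid (t_1, …, t_n) and that, on such a grid, ⟨A, B⟩_eip = Aᵀ E B with the matrix E above. Indexing costs O(n²) per training series; each on-line inner product is then a single O(n) dot product. The query norm ⟨A, A⟩_eip is computed once per query, and the training norms are stored alongside the index. All names are hypothetical.

```python
import numpy as np


def elastic_matrix(t, nu):
    """E[i, j] = exp(-nu * |t_i - t_j|^2) on a common time grid t (assumed setting)."""
    t = np.asarray(t, dtype=float)
    return np.exp(-nu * np.subtract.outer(t, t) ** 2)


def build_index(train, E):
    """Off-line step, O(n^2) per series: store E @ B and <B, B>_eip for each B."""
    index = []
    for b in train:
        eb = E @ np.asarray(b, dtype=float)
        index.append((eb, float(np.dot(b, eb))))
    return index


def nearest_neighbour(a, index, E):
    """On-line step, O(n) per training series, using
    delta_eip(A, B)^2 = <A,A> + <B,B> - 2 A.(E B)."""
    a = np.asarray(a, dtype=float)
    aa = float(a @ E @ a)  # once per query; it does not change the argmin
    d2 = [aa + bb - 2.0 * float(np.dot(a, eb)) for eb, bb in index]
    return int(np.argmin(d2))
```

Only the products E·B(i) and the scalars ⟨B(i), B(i)⟩_eip need to be stored, so the index has the same size as the training set itself.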



4.1.1 Meta parameters

δ_eip is characterized by the meta parameter ν (the
stiffness parameter), which is optimized for each dataset
on the training data by minimizing the classification error
rate of a first-nearest-neighbor classifier. For this kernel, ν
is selected in {100, 10, 1, 0.1, 0.01, 10⁻⁵, 0}.

To explore the potential benefits of an elastic inner
product over the Euclidean inner product, we also
tested the Euclidean distance δ_ed, which, as already
stated, is the limit of δ_eip when ν → ∞.

The kernels exploited by the SVM classifiers are the
Gaussian kernels

$$K_{eip}(A,B) = e^{-\delta_{eip}(A,B)^2/(2\sigma^2)}, \qquad K_{ed}(A,B) = e^{-\delta_{ed}(A,B)^2/(2\sigma^2)}$$

The meta parameter C is selected from the discrete set
{2⁻⁸, 2⁻⁷, …, 1, 2, …, 2¹⁰}, and σ² from
{2⁻⁵, 2⁻⁴, …, 1, 2, …, 2¹⁰}.
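The sketch below, with hypothetical names, turns a matrix of pairwise distances into such a Gaussian kernel. When the distances are induced by an inner product, computed here from a Gram matrix G via δ² = G_ii + G_jj − 2G_ij, the resulting kernel matrix is positive semi-definite. This property is what allows K_eip to be used directly in an SVM, which is not guaranteed for kernels built on elastic distances such as δ_dtw. The check uses random vectors as a stand-in for a Gram matrix of elastic inner products.

```python
import numpy as np


def distances_from_gram(G):
    """Pairwise distances induced by an inner product with Gram matrix G."""
    G = np.asarray(G, dtype=float)
    d = np.diag(G)
    return np.sqrt(np.maximum(d[:, None] + d[None, :] - 2.0 * G, 0.0))


def gaussian_kernel(D, sigma2):
    """K = exp(-D**2 / (2 * sigma2)), applied element-wise."""
    D = np.asarray(D, dtype=float)
    return np.exp(-(D ** 2) / (2.0 * sigma2))


rng = np.random.default_rng(0)
X = rng.standard_normal((10, 4))  # stand-in for embedded series
K = gaussian_kernel(distances_from_gram(X @ X.T), sigma2=2.0)
# Smallest eigenvalue: non-negative up to round-off for an inner-product distance.
print(np.linalg.eigvalsh(K).min())
```

Such a precomputed kernel matrix could, for instance, be passed to an SVM library that accepts precomputed kernels.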

Table 1 gives, for each dataset and each tested kernel
(K_ed and K_eip), the corresponding optimized values of
the meta parameters.

According to the classification results, this experiment
shows that the distance induced by the elastic inner
product, δ_eip, is more effective than the Euclidean
distance for the considered tasks. With the 1-NN
classifiers, δ_eip yields test error rates lower than or equal
to those of δ_ed on all datasets but one, and the same holds
for the SVM classifiers on all datasets but two (average
test rank 1.75 vs. 2.25 for 1-NN, and 1.7 vs. 2.4 for SVM),
as shown in Table 2 and Figures 5 and 6. The stiffness
parameter ν in δ_eip seems to play a significant role in
these classification tasks, for a large majority of datasets.



Only one dataset, Yoga, is better classified by the 1-NN
δ_ed classifier on the test data, although the error rate
on the training data is lower for the 1-NN δ_eip classifier.
For the SVM classifiers, only two datasets, Face (all)
and Coffee, are better classified on the test data by the
SVM K_ed classifier. Nevertheless, for these two datasets,
K_eip reaches a better (or equal best) score on the training
data. We face here the trade-off between learning and
generalization capabilities: the meta parameter ν is
selected so as to minimize the classification error on the
training data. Although this strategy wins on average,
some datasets show that it does not always lead to a good
trade-off; this is the case for the Face (all) and Yoga
datasets.

However, δ_dtw-based classifiers outperform δ_eip-based
classifiers on a majority of datasets. The average ranking
of the classifiers clearly shows that δ_eip ranks between
δ_ed and δ_dtw. δ_eip can thus be considered as a
compromise between the linear Euclidean distance and
the non-linear δ_dtw: it maintains a low computational
cost while compensating for some of the limitations of the
Euclidean distance, which is very sensitive to distortions
in the time axis. It should be noted that, like δ_dtw,
δ_eip can cope with sample substitution, deletion and
insertion. In addition, δ_eip can deal with sample
permutations, while δ_dtw cannot.

4.2 Sequence classification 

We report here a protein classification experiment carried
out using the Protein Classification Benchmark Collection
(PCBC) [18], [1]. This benchmark contains structural
and functional annotations of proteins. The two datasets
that we have exploited, SCOP95 and CATH95, are available
at http://hydra.icgeb.trieste.it/benchmark.

The entries of the SCOP95 dataset are sequences of
variable lengths with relatively low sequence similarity
(less than 95% sequence identity) between the protein
families.

The CATH95 dataset contains near-identical protein 
families of variable lengths in which the proteins have 
a high sequence similarity (more than 95% sequence 
identity). 

The considered classification tasks involve protein
domain sequence and structure comparisons at various
levels of the structural hierarchies. We have considered
the following 14 PCB subsets:

• PCB00001 SCOP95_Superfamily_Family
• PCB00002 SCOP95_Superfamily_5fold
• PCB00003 SCOP95_Fold_Superfamily
• PCB00004 SCOP95_Fold_5fold
• PCB00005 SCOP95_Class_Fold
• PCB00006 SCOP95_Class_5fold
• PCB00007 CATH95_Homology_Similarity
• PCB00008 CATH95_Homology_5fold
• PCB00009 CATH95_Topology_Homology
• PCB00010 CATH95_Topology_5fold
• PCB00011 CATH95_Architecture_Topology
TABLE 1

Dataset sizes and meta parameters used in conjunction with the K_ed, K_eip and K_dtw kernels

| Dataset | Length | #class | #train | #test | K_ed: C; σ | K_eip: ν; C; σ | K_dtw: C; σ |
|---|---|---|---|---|---|---|---|
| Synthetic control | 60 | 6 | 300 | 300 | 1.0; 0.125 | .1; .25; .0625 | 8.0; 4.0 |
| Gun-Point | 150 | 2 | 50 | 150 | 256; .5 | 0.01; 256; .5 | 16.0; 0.0312 |
| CBF | 128 | 3 | 30 | 900 | 8; 1.0 | .001; 4.0; .0312 | 1.0; 1.0 |
| Face (all) | 131 | 14 | 560 | 1690 | 4; 0.5 | .1; 8.0; .5 | 2.0; 0.25 |
| OSU Leaf | 427 | 6 | 200 | 242 | 2; 0.125 | .1; 4; 0.125 | 4.0; 0.062 |
| Swedish Leaf | 128 | 15 | 500 | 625 | 128; 0.125 | 10; 8; .0625 | 4.0; 0.031 |
| 50 Words | 270 | 50 | 450 | 455 | 32; 0.5 | 0.01; 32; 0.5 | 4.0; 0.062 |
| Trace | 275 | 4 | 100 | 100 | 8; 0.0156 | .001; 256; .0625 | 4; 0.25 |
| Two Patterns | 128 | 4 | 1000 | 4000 | 4.0; 0.25 | .01; 1; .0312 | 0.25; 0.125 |
| Wafer | 152 | 2 | 1000 | 6174 | 4.0; 0.5 | 0.01; 32; .5 | 1.0; 0.016 |
| Face (four) | 350 | 4 | 24 | 88 | 8.0; 2.0 | .01; 16; 2 | 16; 0.5 |
| Lightning2 | 637 | 2 | 60 | 61 | 2.0; 0.125 | .001; 128; 2 | 2.0; 0.031 |
| Lightning7 | 319 | 7 | 70 | 73 | 32.0; 256.0 | .1; 16; .5 | 4; 0.25 |
| ECG200 | 96 | 2 | 100 | 100 | 8.0; 0.25 | 0; 1024; .0625 | 2; 0.62 |
| Adiac | 176 | 37 | 390 | 391 | 1024.0; 0.125 | 10; 256.0; .0312 | 16; 0.0039 |
| Yoga | 426 | 2 | 300 | 3000 | 64.0; 0.125 | 1; 32; .0625 | 4; 0.008 |
| Fish | 463 | 7 | 175 | 175 | 64.0; 1.0 | .01; 256; 1 | 8; 0.016 |
| Coffee | 286 | 2 | 28 | 28 | 128.0; 4.0 | .01; 1024; 4 | 8; 0.062 |
| OliveOil | 570 | 4 | 30 | 30 | 2.0; 0.125 | .01; 64; 2 | 2; 0.125 |
| Beef | 470 | 5 | 30 | 30 | 128.0; 4.0 | 0; 64; .5 | 16; 0.016 |


TABLE 2

Comparative study using the UCR datasets: classification error rates (in %) obtained using the first-nearest-neighbor (1-NN) classification rule with δ_ed, δ_eip and δ_dtw, and an SVM classifier with the K_ed, K_eip and K_dtw kernels. Two scores are given as S1 / S2: the first one, S1, is evaluated on the training data, while the second one, S2, is evaluated on the test data. For each classification method (1-NN and SVM), the rank of each classifier is given in parentheses ((1): best classifier, (2): 2nd best classifier, (3): 3rd best classifier).

| Dataset | 1-NN δ_ed | 1-NN δ_eip | 1-NN δ_dtw | SVM K_ed | SVM K_eip | SVM K_dtw |
|---|---|---|---|---|---|---|
| Synthetic control | 9(3) / 12(3) | .67(1) / 1(2) | 1.0(2) / 0.67(1) | 3(3) / 2.33(3) | .33(2) / .67(1) | 0(1) / 1(2) |
| Gun-Point | 4.0(1) / 8.67(1) | 4.0(1) / 8.67(1) | 18.36(3) / 9.3(3) | 4.0(3) / 6.0(3) | 2.0(2) / 2.67(2) | 0(1) / 1.33(1) |
| CBF | 16.67(3) / 14.78(3) | 3.33(2) / 4.22(2) | 0(1) / 0.33(1) | 3.33(2) / 10.89(3) | 3.33(1) / 5(1) | 3.33(1) / 5.44(2) |
| Face (all) | 11.25(3) / 28.64(3) | 7.5(2) / 26.33(2) | 6.8(1) / 19.23(1) | 9.82(3) / 16.45(3) | 6.25(2) / 24.91(2) | .54(1) / 16.98(1) |
| OSU Leaf | 37.0(2) / 48.35(2) | 37(2) / 48.35(2) | 33.17(1) / 40.9(1) | 35(3) / 44.21(2) | 34.5(2) / 44.21(2) | 20(1) / 23.55(1) |
| Swedish Leaf | 26.6(3) / 21.12(3) | 24.4(1) / 20.96(2) | 24.65(2) / 20.8(1) | 15(2) / 8.64(2) | 15(2) / 8.64(2) | 7(1) / 5.6(1) |
| 50 Words | 34.47(3) / 36.32(3) | 32.2(1) / 32.73(2) | 33.18(2) / 31(1) | 33.56(3) / 30.99(3) | 31.78(2) / 29.67(2) | 15.21(1) / 17.58(1) |
| Trace | 18(3) / 24(2) | 16(2) / 24(2) | 0(1) / 0(1) | 9(3) / 19(3) | 1(2) / 7(2) | 0(1) / 2(1) |
| Two Patterns | 8.5(3) / 9.32(3) | 4.3(2) / 3.62(2) | 0(1) / 0(1) | 8.6(3) / 7.45(3) | 5.5(2) / 3.52(2) | 0(1) / 0(1) |
| Wafer | 0.7(2) / 0.45(2) | .5(1) / .42(1) | 1.4(3) / 2.01(3) | .7(3) / .7(3) | .2(2) / .68(2) | 0(1) / 0.39(1) |
| Face (four) | 37.5(3) / 21.59(3) | 33.33(2) / 19.31(2) | 26.09(1) / 17.05(1) | 20.84(3) / 19.31(3) | 16.67(2) / 13.63(2) | 8.33(1) / 5.68(1) |
| Lightning2 | 25.0(3) / 24.59(3) | 20(2) / 16.39(2) | 13.56(1) / 13.1(1) | 21.77(3) / 31.14(3) | 20(2) / 26.22(2) | 8.33(1) / 19.67(1) |
| Lightning7 | 35.71(3) / 42.47(3) | 30.0(1) / 32.87(2) | 33.33(2) / 27.4(1) | 37.14(3) / 36.98(3) | 34.29(2) / 35.61(2) | 17.14(1) / 16.43(1) |
| ECG200 | 14.0(2) / 12.0(2) | 1.0(1) / 2.0(1) | 23.23(3) / 23(3) | 8.0(3) / 9.0(2) | 3.0(1) / 7.0(1) | 7(2) / 13(3) |
| Adiac | 41.28(3) / 38.87(1) | 39.59(1) / 38.87(1) | 40.62(2) / 39.64(3) | 26.67(3) / 24.04(1) | 25.13(2) / 24.04(1) | 24.61(1) / 25.32(3) |
| Yoga | 22.67(3) / 16.9(2) | 21.67(2) / 22.26(3) | 16.37(1) / 16.4(1) | 17.66(3) / 14.43(2) | 15.33(2) / 14.4(2) | 11(1) / 11.2(1) |
| Fish | 24.0(1) / 21.71(2) | 24.0(1) / 21.71(2) | 26.44(3) / 16.57(1) | 14.86(3) / 13.14(3) | 13.14(2) / 12.57(2) | 6.86(1) / 4.57(1) |
| Coffee | 21.43(2) / 25.0(2) | 21.43(2) / 25.0(2) | 14.81(1) / 17.86(1) | 0(1) / 0(1) | 0(1) / 7.14(2) | 10.71(3) / 17.86(3) |
| OliveOil | 13.33(1) / 13.33(1) | 13.33(1) / 13.33(1) | 13.79(3) / 13.33(1) | 10.0(1) / 10.0(1) | 10.0(1) / 10.0(1) | 13.33(3) / 16.67(3) |
| Beef | 46.67(1) / 46.67(1) | 46.67(1) / 46.67(1) | 55.17(3) / 50(3) | 37.67(2) / 30(1) | 37.67(2) / 30(1) | 32.14(1) / 42.85(3) |
| Average rank | (2.4) / (2.25) | (1.45) / (1.75) | (1.85) / (1.5) | (2.65) / (2.4) | (1.8) / (1.7) | (1.25) / (1.6) |



• PCB00012 CATH95_Architecture_5fold
• PCB00013 CATH95_Class_Architecture
• PCB00014 CATH95_Class_5fold

We evaluate the elastic cosine similarity based on the
eip defined for symbolic sequences (Def. 3.6, Eq. 7)
against five other similarity measures commonly used in
bioinformatics:

• BLAST [2]: the Basic Local Alignment Search Tool
is a very popular family of fast heuristic search
methods used for finding similar regions between
two or more nucleotide or amino acid sequences.



• SW [17]: The Smith-Waterman algorithm performs
local sequence alignment, i.e., it determines similar
regions between two nucleotide or protein sequences.
Instead of looking at the sequences globally as NW
does, the Smith-Waterman algorithm compares
subsequences of all possible lengths.

• NW [11]: The Needleman-Wunsch algorithm performs
a maximal global alignment of two strings. It is
commonly used in bioinformatics to align protein or
nucleotide sequences.

• LA kernel [12]: The Local Alignment kernel is used
[Figure 5: scatter plot "1-NN δ_ed vs. 1-NN δ_eip"; horizontal axis: 1-NN δ_ed error rate (%)]

Fig. 5. Comparison of error rates (in %) between two 1-NN classifiers, based on the Euclidean distance (1-NN δ_ed) and on the distance induced by a time-warp inner product (1-NN δ_eip). The straight line has a slope of 1.0, and dots correspond, for the pair of classifiers, to the error rates on the train (star) or test (circle) data sets. A dot below (resp. above) the straight line indicates that distance δ_eip has a lower (resp. higher) error rate than distance δ_ed. Black 'x' marks indicate error rates evaluated on the training data, while red squares indicate error rates evaluated on the test data.



[Figure 6: scatter plot "SVM K_ed vs. SVM K_eip"; horizontal axis: SVM K_ed error rate (%)]

Fig. 6. Comparison of error rates (in %) between two SVM classifiers, the first one based on the Euclidean distance Gaussian kernel (SVM K_ed), and the second one based on a Gaussian kernel induced by a time-warp inner product (SVM K_eip). The straight line has a slope of 1.0, and dots correspond, for the pair of classifiers, to the error rates on the train (star) or test (circle) data sets. A dot below (resp. above) the straight line indicates that SVM K_eip has a lower (resp. higher) error rate than SVM K_ed. Black 'x' marks indicate error rates evaluated on the training data, while red squares indicate error rates evaluated on the test data.



to detect local alignments between strings by convolving
simple basic kernels. Its construction mimics the local
alignment scoring schemes proposed in the SW algorithm.

• PRIDE [5]: The PRIDE score is estimated as the
PRobability of IDEntity between two protein 3D
TABLE 3

Comparative study using the PCBC protein datasets available at http://hydra.icgeb.trieste.it/benchmark. The assessment measure is the average AUC value based on the ROC curves obtained by the first-nearest-neighbor classification rule when the BLAST, SW, NW, LA, PRIDE and eCOS (ν = 1, .1, .05, .01, .001) similarity measures are used. Best average AUC values are in boldface characters.

| 1-NN | BLAST | SW | NW | LA | PRIDE | eCOS (ν = 1) | eCOS (ν = .1) | eCOS (ν = .05) | eCOS (ν = .01) | eCOS (ν = .001) | eCOS (Best) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SCOP95_supfam_fam_1 | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? | ? |
| SCOP95_supfam_fam_kfold_2 | 94.87 | 97.74 | 97.73 | 96.71 | 97.41 | 91.02 | 94.84 | 95.02 | 94.15 | 91.44 | 95.02 |
| SCOP95_fold_supfam_3 | 56.87 | 64.82 | 67.47 | 58.38 | 84.72 | 75.20 | 79.27 | 79.09 | 75.71 | 69.89 | 79.27 |
| SCOP95_fold_supfam_kfold_4 | 90.30 | 93.77 | 94.54 | 91.38 | 95.82 | 88.93 | 93.40 | 93.32 | 90.67 | 85.79 | 93.40 |
| SCOP95_class_fold_5 | 60.18 | 61.04 | 62.90 | 56.90 | 78.27 | 59.25 | 65.65 | 67.13 | 67.23 | 64.45 | 67.23 |
| SCOP95_class_fold_kfold_6 | 92.23 | 92.14 | 92.85 | 88.28 | 93.24 | 80.43 | 88.38 | 88.76 | 85.32 | 79.81 | 88.76 |
| CATH95_H_S_7 | 97.90 | 99.20 | 99.29 | 98.81 | 95.47 | 95.42 | 97.03 | 96.84 | 95.78 | 91.81 | 97.03 |
| CATH95_H_S_kfold_8 | 96.63 | 98.30 | 98.32 | 97.80 | 95.02 | 95.10 | 97.28 | 97.22 | 96.36 | 93.67 | 97.28 |
| CATH95_T_H_9 | 57.33 | 64.75 | 66.97 | 61.44 | 76.71 | 73.25 | 78.57 | 79.12 | 77.38 | 70.20 | 79.12 |
| CATH95_T_H_kfold_10 | 89.76 | 92.60 | 93.14 | 90.66 | 90.96 | 88.98 | 93.02 | 92.78 | 90.28 | 84.25 | 93.02 |
| CATH95_A_T_11 | 54.52 | 56.82 | 57.97 | 54.12 | 67.37 | 57.78 | 61.63 | 62.09 | 61.33 | 59.70 | 62.09 |
| CATH95_A_T_kfold_12 | 90.21 | 91.28 | 91.77 | 88.91 | 84.89 | 82.44 | 88.28 | 88.14 | 84.96 | 80.09 | 88.28 |
| CATH95_C_A_13 | 64.81 | 71.28 | 69.87 | 69.17 | 55.72 | 37.74 | 46.33 | 49.79 | 62.61 | 63.63 | 63.63 |
| CATH95_C_A_kfold_14 | 90.80 | 89.74 | 90.51 | 86.21 | 80.63 | 76.75 | 83.29 | 83.94 | 80.57 | 75.71 | 83.94 |
| Average | 79.03 | 82.64 | 83.38 | 79.83 | 84.69 | 77.14 | 82.07 | 82.58 | 81.86 | 77.94 | 84.00 |



structures. The similarity between two proteins is
computed by comparing histograms of the pairwise
distances between Cα residues, whose distribution is
highly characteristic of protein folds.

The average AUC (area under the ROC curve) measure
is evaluated for 1-NN classifiers exploiting, respectively,
BLAST, SW, NW, LA, PRIDE and eCOS(ν) as similarity
measures. One can notice that these datasets are quite
well suited for global alignment since, as shown in
Table 3, the NW algorithm performs better than the
SW algorithm. The eip structure, which considers global
alignment, is thus well adapted to the task. These
experiments show that, for ν = .05, the eCOS classifier
performs better on average (average AUC 82.58) than the
BLAST heuristics [2] (79.03) and LA [12] (79.83), the local
alignment kernel for strings. Furthermore, it performs
almost as well as the SW and NW algorithms (82.64 and
83.38). The PRIDE method gets the best results, but it
uses the tertiary structure of the proteins, while all the
other methods exploit only the primary structure. Here
again, a ranking based on eCOS similarity has a
complexity that could be kept linear at exploitation stage
(i.e., when testing numerous sequences against massive
datasets). These very positive results open perspectives
for the fast filtering of biological symbolic sequences.
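To make the assessment measure concrete, the AUC of a ranking can be computed directly from similarity scores with the rank-sum (Mann-Whitney) formulation sketched below in Python. This is a generic illustration with a hypothetical function name; the exact way the PCB benchmark builds its ROC curves and averages the AUC values may differ.

```python
def roc_auc(scores, is_positive):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive item is scored higher than a randomly chosen negative one
    (ties count one half). O(P * N), which is fine for small illustrations."""
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative item")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Illustrative, made-up eCOS similarities of one query to four database
# sequences; True marks sequences from the query's own category.
print(roc_auc([0.91, 0.40, 0.75, 0.12], [True, False, True, False]))
```

Averaging such AUC values over the queries or categories of a PCB subset yields an average AUC of the kind reported in Table 3.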

4.3 Experimental complexity 

To evaluate the computational cost of δ_eip in practice,
we compare it with two other distances, namely δ_ed
(the Euclidean distance), which has a linear complexity,
and δ_dtw, the Dynamic Time Warping distance, which
has a quadratic complexity. In addition, we evaluate the
computational cost of the indexed version of δ_eip, which
we refer to as δ_i-eip. The experiment consists of producing
random datasets of 100 time series of increasing lengths
(10, 100, 1000 and 10000 samples) and computing the
100×100 distance matrices. Figure 7 gives the elapsed
time in seconds, on a logarithmic scale, for the four
distances as a function of the length of the time series.
Compared to δ_dtw, δ_eip is evaluated very efficiently
using the matrix computation given in Eq. 6, although,
without any off-line indexing of the time series, δ_eip
cannot compete with δ_ed as the length of the time series
increases. The δ_i-eip curve clearly has the same slope as
the δ_ed curve, showing that the off-line indexed version
of δ_eip has a linear complexity, with an overhead, itself
linear, compared to δ_ed, mainly due to the loading of the
index.
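The protocol above can be reproduced at small scale with the following NumPy sketch, which times the four distances on random series. It assumes the simple matrix form δ_eip(A, B)² = (A − B)ᵀ E (A − B) on a shared integer time grid, uses a textbook O(n²) DTW with squared local costs (one common variant, not necessarily the one used in the experiments), and implements δ_i-eip by pre-computing E·x and xᵀ E x for every series. Sizes are kept small so that the pure-Python DTW loop finishes quickly; absolute timings will differ from Fig. 7. All names are hypothetical.

```python
import time

import numpy as np


def d_ed(a, b):
    """Euclidean distance, O(n)."""
    return float(np.linalg.norm(a - b))


def d_eip(a, b, E):
    """Assumed matrix form on a shared grid: sqrt((a - b)^T E (a - b)), O(n^2)."""
    d = a - b
    return float(np.sqrt(max(d @ E @ d, 0.0)))


def d_dtw(a, b):
    """Textbook dynamic time warping with squared local costs, O(n * m)."""
    a, b = [float(x) for x in a], [float(x) for x in b]
    n, m = len(a), len(b)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m] ** 0.5


def time_distances(n_series=10, length=50, nu=0.1, seed=0):
    """Time the n_series x n_series distance matrix for each distance."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_series, length))
    t = np.arange(length, dtype=float)
    E = np.exp(-nu * np.subtract.outer(t, t) ** 2)
    # Off-line index for the indexed variant: rows of X @ E are (E x)^T (E is symmetric).
    EX = X @ E
    norms = np.einsum("ij,ij->i", X, EX)  # x^T E x for every series
    distances = {
        "ed": lambda i, j: d_ed(X[i], X[j]),
        "eip": lambda i, j: d_eip(X[i], X[j], E),
        "i-eip": lambda i, j: float(np.sqrt(max(norms[i] + norms[j] - 2.0 * X[i] @ EX[j], 0.0))),
        "dtw": lambda i, j: d_dtw(X[i], X[j]),
    }
    results = {}
    for name, f in distances.items():
        start = time.perf_counter()
        M = np.array([[f(i, j) for j in range(n_series)] for i in range(n_series)])
        results[name] = (time.perf_counter() - start, M)
    return results


for L in (10, 50):
    res = time_distances(length=L)
    print(L, {name: round(sec, 4) for name, (sec, _) in res.items()})
```

Running it for increasing lengths gives a rough feel for the growth rates; note that the indexed variant returns the same distance matrix as the direct matrix form, only the cost per pair changes.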

5 Conclusion 

This paper has proposed what we call a family of elastic
inner products able to cope with non-uniformly sampled
time series of various lengths, as long as they do not
contain the zero or null symbol value. These constructions
allow one to embed any such time series in a single
inner product space that, in some sense, generalizes the
notion of Euclidean inner product space. The recursive
structure of the proposed construction offers the
possibility to manage several elastic dimensions. Some
applicative benefits can be expected in time series or
sequence analysis when time elasticity is an issue, for
instance in the field of numeric or symbolic sequence
data mining.

Although the algorithmic complexity required to evaluate
an elastic inner product is in general quadratic, its
computation can be performed much more efficiently than
with dynamic programming algorithms. In addition, we
have shown that, for some information retrieval
applications for which embeddings of time series or
symbolic sequences into a finite-dimensional Euclidean
space are possible, this quadratic complexity can be
brought down to a linear complexity at exploitation time,
although a quadratic computational cost must still be
paid once during a preprocessing (indexing) step.
Fig. 7. Elapsed time in seconds as a function of the time series length for 10000 distance computations and for the four tested distances: δ_ed (cross symbol, plain grey line), δ_eip (triangle symbol, plain line), δ_i-eip (round symbol, dashed line) and δ_dtw (square symbol, plain line).



The preliminary experiments we have carried out on
some time series and symbolic sequence classification
tasks show that embedding time series into an elastic
inner product space may bring significant accuracy
improvement when compared to Euclidean inner product
space embeddings, as such embeddings compensate, at
least partially, for the limitations of the Euclidean
distance, which is very sensitive to distortions in the
time axis. Although our experiments show that dynamic
programming matching algorithms outperform distances
derived from an elastic inner product on a majority of
datasets, on some datasets such distances lead to similar
accuracies.

The experiment carried out on symbolic sequence
alignment involves sequences of various lengths. It
also shows some very interesting perspectives in the
scope of fast filtering of massive data, since the accuracy
obtained by a 1-NN symbolic elastic cosine classifier with
a potentially linear complexity at exploitation time is
comparable to the one obtained using dynamic
programming algorithms (NW and SW), whose complexity
is quadratic when the alignment search space is not
restricted.

Finally, the general recursive structure of the elastic
inner product opens perspectives in the processing of
more complex data, such as tree data mining, for which
considering several elastic dimensions may be relevant
and efficient. The possibility to decompose complex
structures onto sets of elastic basis vectors opens
perspectives in various areas of application such as data
compression, multi-dimensional scaling or matching pursuit.